The slow way to generate an ROC curve

Logistic regression (GLM)
Data                 : bbb
Response variable    : buyer
Level                : yes in buyer
Explanatory variables: gender, last, total, child, youth, cook, do_it, reference, art, geog 
Null hyp.: there is no effect of x on buyer
Alt. hyp.: there is an effect of x on buyer

                OR    OR% coefficient std.error z.value p.value    
 (Intercept)                   -2.361     0.049 -47.891  < .001 ***
 gender|M    2.140 114.0%       0.761     0.036  21.272  < .001 ***
 last        0.910  -9.0%      -0.095     0.003 -33.918  < .001 ***
 total       1.001   0.1%       0.001     0.000   5.630  < .001 ***
 child       0.830 -17.0%      -0.186     0.017 -10.775  < .001 ***
 youth       0.893 -10.7%      -0.113     0.026  -4.327  < .001 ***
 cook        0.763 -23.7%      -0.270     0.017 -15.782  < .001 ***
 do_it       0.583 -41.7%      -0.539     0.027 -19.994  < .001 ***
 reference   1.265  26.5%       0.235     0.027   8.837  < .001 ***
 art         3.176 217.6%       1.156     0.022  52.185  < .001 ***
 geog        1.776  77.6%       0.574     0.019  30.824  < .001 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pseudo R-squared: 0.205
Log-likelihood: -12061.106, AIC: 24144.211, BIC: 24241.229
Chi-squared: 6233.253 df(10), p.value < .001 
Nr obs: 50,000 

Logistic regression (GLM)
Data                 : bbb 
Response variable    : buyer 
Level(s)             : yes in buyer 
Explanatory variables: gender, last, total, child, youth, cook, do_it, reference, art, geog 
Interval             : confidence 
Prediction dataset   : bbb 
Rows shown           : 10 of 50,000 

 gender last total child youth cook do_it reference art geog Prediction  2.5% 97.5%
      M   29   357     3     2    2     0         1   0    2      0.020 0.017 0.024
      M   27   138     0     1    0     1         0   0    1      0.017 0.015 0.019
      F   15   172     0     0    2     0         0   0    0      0.016 0.014 0.017
      F    7   272     0     0    0     0         1   0    0      0.077 0.071 0.083
      F   15   149     0     0    1     0         0   0    0      0.020 0.019 0.022
      F    7   113     0     1    0     0         0   0    0      0.047 0.044 0.051
      M   25    15     0     0    0     1         0   0    0      0.011 0.010 0.013
      M    1   238     2     1    2     3         0   0    3      0.087 0.075 0.101
      F    5   418     0     2    3     2         0   3    1      0.391 0.353 0.431
      F   11   123     0     1    0     0         0   0    0      0.033 0.031 0.036

Function to calculate the true positive rate (TPR) and true negative rate (TNR) at different trade-off values (also known as break-even values)
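
A minimal sketch of such a function, written in Python for illustration and not taken from the notebook output. It assumes a customer is targeted when the predicted probability exceeds the break-even rate cost / margin. The name `rates_at_break_even`, the toy data, and the cost and margin values are hypothetical.

```python
def rates_at_break_even(outcome, pred, cost, margin):
    """Return confusion counts, TPR and TNR at the break-even rate cost / margin."""
    be = cost / margin
    pairs = list(zip(outcome, pred))
    tp = sum(1 for y, p in pairs if p > be and y)       # targeted buyers
    fp = sum(1 for y, p in pairs if p > be and not y)   # targeted non-buyers
    tn = sum(1 for y, p in pairs if p <= be and not y)  # skipped non-buyers
    fn = sum(1 for y, p in pairs if p <= be and y)      # skipped buyers
    return {
        "BE": be, "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
    }


# Toy data, made up for illustration (not the bbb data)
outcome = [True, False, True, False, False, True]
pred = [0.40, 0.05, 0.09, 0.20, 0.02, 0.03]

# Assumed values: cost of 0.5 per contact and margin of 6 per sale
result = rates_at_break_even(outcome, pred, cost=0.5, margin=6)
print(result)
```

The sketch divides by the number of buyers and non-buyers, so it assumes both groups are present in the data.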

Creating the data for the ROC curve
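
One way to build this data is to sweep the break-even value from 0 to 1 and record one (TNR, TPR) pair per value. The Python sketch below is an illustration under assumptions: the cost grid of 0 to 6 in steps of 0.05, the margin of 6, the helper name `roc_point`, and the toy data are all chosen for the example.

```python
def roc_point(outcome, pred, be):
    """Return (TNR, TPR) when everyone with pred > be is targeted."""
    n_pos = sum(1 for y in outcome if y)
    n_neg = len(outcome) - n_pos
    tp = sum(1 for y, p in zip(outcome, pred) if y and p > be)
    tn = sum(1 for y, p in zip(outcome, pred) if not y and p <= be)
    return tn / n_neg, tp / n_pos


# Toy data, made up for illustration (not the bbb data)
outcome = [True, False, True, False, False, True]
pred = [0.40, 0.05, 0.09, 0.20, 0.02, 0.03]

margin = 6                            # assumed margin per sale
costs = [i / 20 for i in range(121)]  # assumed grid: 0, 0.05, ..., 6

# One row per cost: (BE, TNR, TPR)
roc_data = [(c / margin, *roc_point(outcome, pred, c / margin)) for c in costs]

print(roc_data[0])   # BE = 0: everyone is targeted
print(roc_data[-1])  # BE = 1: nobody is targeted
```

At BE = 0 every customer is targeted, so TPR is 1 and TNR is 0. At BE = 1 nobody is targeted, so TPR is 0 and TNR is 1. The rows in between trace the curve.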

Plotting the ROC curve

Calculating the TPR and TNR for the break-even point in the BBB case

Interactive version using ggplotly

Confirm plot with pROC package

How does the confusion matrix change at different values of BE?

Probabilistic interpretation of AUC

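The AUC can be read as the probability that a randomly chosen buyer receives a higher predicted probability than a randomly chosen non-buyer, with ties counted as one half. See https://www.alexejgossmann.com/auc/ for a very nice discussion. Comparing every buyer with every non-buyer gives: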

[1] 0.8117416

Let's compare that result to what we would get with a formal calculation:

[1] 0.8117416
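
The two numbers above agree because the pairwise count has a closed form. The Python sketch below compares the pairwise definition with the rank-sum (Mann-Whitney U) identity on toy data. It is an illustration only: the source does not say which formula the formal calculation uses, and the function names and data are hypothetical.

```python
def auc_pairwise(outcome, pred):
    """Share of (buyer, non-buyer) pairs the buyer wins; ties count one half."""
    pos = [p for y, p in zip(outcome, pred) if y]
    neg = [p for y, p in zip(outcome, pred) if not y]
    s = 0.0
    for pp in pos:
        s += sum(1 for pn in neg if pp > pn)
        s += sum(1 for pn in neg if pp == pn) / 2
    return s / (len(pos) * len(neg))


def auc_rank(outcome, pred):
    """Same quantity from the rank sum of the buyers (midranks for ties)."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    ranks = [0.0] * len(pred)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied predictions
        while j + 1 < len(order) and pred[order[j + 1]] == pred[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(1 for y in outcome if y)
    n_neg = len(outcome) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, outcome) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy data with one tie, made up for illustration (not the bbb data)
outcome = [True, False, True, False, False, True, False]
pred = [0.40, 0.05, 0.09, 0.20, 0.02, 0.03, 0.09]

a = auc_pairwise(outcome, pred)
b = auc_rank(outcome, pred)
print(a, b)  # both are 0.625 for this toy data
```

The pairwise version needs one comparison per buyer/non-buyer pair, while the rank version only needs a sort, which is why the pairwise loop is the slow way.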

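Let's try a sampling approach: pair randomly drawn buyers with randomly drawn non-buyers and compute the share of pairs in which the buyer has the higher prediction. Two separate draws give: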

[1] 0.816
[1] 0.812

Let's repeat the simulation and average the results:

[1] 0.81164
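
The sampling idea can be sketched in Python as follows. This is an illustration under assumptions, not the notebook's code: the scores are synthetic, and the sample size of 500, the 100 repetitions, the seed, and the names are chosen for the example. Ties are ignored, which is harmless for continuous scores.

```python
import random
from bisect import bisect_left

random.seed(1234)

# Synthetic scores, made up for illustration (not the bbb predictions)
pred_did_buy = [random.gauss(1.0, 1.0) for _ in range(1000)]
pred_did_not_buy = [random.gauss(0.0, 1.0) for _ in range(1000)]


def sampled_auc(pos, neg, nr):
    """Pair nr random buyers with nr random non-buyers; return the share the buyers win."""
    a = random.sample(pos, nr)  # sampling without replacement
    b = random.sample(neg, nr)
    return sum(x > y for x, y in zip(a, b)) / nr


# Repeat the simulation and average the estimates
estimates = [sampled_auc(pred_did_buy, pred_did_not_buy, 500) for _ in range(100)]
auc_sim = sum(estimates) / len(estimates)

# Exact pairwise AUC for comparison: for each buyer, count the
# non-buyers with a strictly lower score using a sorted list
neg_sorted = sorted(pred_did_not_buy)
auc_exact = sum(bisect_left(neg_sorted, x) for x in pred_did_buy) / (
    len(pred_did_buy) * len(neg_sorted)
)

print(round(auc_sim, 3), round(auc_exact, 3))
```

A single draw is noisy, but the average over many draws settles close to the exact pairwise value, which mirrors the pattern in the output above.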